AITopics

2411.02403

Country:

Asia > South Korea > Seoul > Seoul (0.05)
Asia > Taiwan (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Oyeyemi, Dare Azeez, Ojo, Adebola K.

SMS Spam Detection and Classification to Combat Abuse in Telephone Networks Using Natural Language Processing

arXiv.org Artificial Intelligence Jun-4-2024

In the modern era, mobile phones have become ubiquitous, and Short Message Service (SMS) has grown to become a multi-million-dollar service due to the widespread adoption of mobile devices and the millions of people who use SMS daily. However, SMS spam has also become a pervasive problem that endangers users' privacy and security through phishing and fraud. Despite numerous spam filtering techniques, there is still a need for a more effective solution to address this problem [1]. This research addresses the pervasive issue of SMS spam, which poses threats to users' privacy and security. Despite existing spam filtering techniques, the high false-positive rate persists as a challenge. The study introduces a novel approach utilizing Natural Language Processing (NLP) and machine learning models, particularly BERT (Bidirectional Encoder Representations from Transformers), for SMS spam detection and classification. Data preprocessing techniques, such as stop word removal and tokenization, are applied, along with feature extraction using BERT. Machine learning models, including SVM, Logistic Regression, Naïve Bayes, Gradient Boosting, and Random Forest, are integrated with BERT for differentiating spam from ham messages. Evaluation results revealed that the Naïve Bayes classifier + BERT model achieves the highest accuracy at 97.31% with the fastest execution time of 0.3 seconds on the test dataset. This approach demonstrates a notable enhancement in spam detection efficiency and a low false-positive rate. The developed model presents a valuable solution to combat SMS spam, ensuring faster and more accurate detection. This model not only safeguards users' privacy but also assists network providers in effectively identifying and blocking SMS spam messages.
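The preprocessing and classification steps named in this abstract (tokenization, stop word removal, a Naïve Bayes classifier) can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: it swaps BERT feature extraction for plain word counts, and the stop word list, the `NaiveBayes` class, and the four toy messages are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "i", "at", "for", "and", "your"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy data, not taken from the paper's dataset.
train_texts = [
    "Win a free prize now",
    "Free cash claim your prize",
    "Are you coming to lunch",
    "Call me when you get home",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Calling `model.predict("claim free prize")` classifies a new message. In the approach the abstract describes, the word-count features would be replaced by BERT-derived features before the classifier is applied.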

dataset, sms message, spam message, (11 more...)

doi: 10.9734/JAMCS/2023/v38i101832

2406.06578

Country:

Asia > Singapore (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
Africa > Nigeria > Oyo State > Ibadan (0.04)

Genre: Research Report > New Finding (0.35)

Industry:

Telecommunications (1.00)
Information Technology > Security & Privacy (0.74)
Information Technology > Networks (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)

arXiv.org Artificial Intelligence Apr-15-2024

SpamDam: Towards Privacy-Preserving and Adversary-Resistant SMS Spam Detection

Li, Yekai, Zhang, Rufan, Rong, Wenxin, Mi, Xianghang

In this study, we introduce SpamDam, an SMS spam detection framework designed to overcome key challenges in detecting and understanding SMS spam, such as the lack of public SMS spam datasets, increasing privacy concerns of collecting SMS data, and the need for adversary-resistant detection models. SpamDam comprises four innovative modules: an SMS spam radar that identifies spam messages from online social networks (OSNs); an SMS spam inspector for statistical analysis; SMS spam detectors (SSDs) that enable both central training and federated learning; and an SSD analyzer that evaluates model resistance against adversaries in realistic scenarios. Leveraging SpamDam, we have compiled over 76K SMS spam messages from Twitter and Weibo between 2018 and 2023, forming the largest dataset of its kind. This dataset has enabled new insights into recent spam campaigns and the training of high-performing binary and multi-label classifiers for spam detection. Furthermore, the effectiveness of federated learning in enabling privacy-preserving SMS spam detection has been well demonstrated. Additionally, we have rigorously tested the adversarial robustness of SMS spam detection models, introducing the novel reverse backdoor attack, which has shown effectiveness and stealthiness in practical tests.
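The abstract does not say how SpamDam aggregates client models, so the following is only a generic sketch of the federated averaging (FedAvg) aggregation step commonly used in federated learning. Each client trains locally on its own messages and sends back only model weights, which the server averages in proportion to each client's sample count. The function name and the example numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighted by local sample counts.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical clients: the second holds three times as much data,
# so its weights count three times as much in the global model.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Raw SMS text never leaves a client in this scheme. Only the weight vectors are shared, which is what makes the approach attractive for privacy-sensitive data.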

dataset, sms spam message, spam message, (14 more...)

2404.09481

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Africa > Nigeria > Jigawa State > Dutse (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Sheu, Guang-Yih, Liu, Nai-Ru

Sampling Audit Evidence Using a Naive Bayes Classifier

arXiv.org Artificial Intelligence Mar-20-2024

Taiwan's auditors have suffered from processing excessive audit data, including drawing audit evidence. This study advances sampling techniques by integrating machine learning with sampling. This machine learning integration helps avoid sampling bias, preserve randomness and variability, and target riskier samples. We first classify data into classes using a Naive Bayes classifier. Next, a user-based, item-based, or hybrid approach is employed to draw audit evidence. The representativeness index is the primary metric for measuring its representativeness. The user-based approach samples data symmetric around the median of a class as audit evidence. It may be equivalent to a combination of monetary and variable samplings. The item-based approach represents asymmetric sampling based on posterior probabilities for obtaining risky samples as audit evidence. It may be identical to a combination of non-statistical and monetary samplings. Auditors can hybridize those user-based and item-based approaches to balance representativeness and riskiness in selecting audit evidence. Three experiments show that sampling using machine learning integration has the benefits of drawing unbiased samples, handling complex patterns, correlations, and unstructured data, and improving efficiency in sampling big data. However, the limitations are the classification accuracy output by machine learning algorithms and the range of prior probabilities.
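One possible reading of the two sampling approaches in this abstract is sketched below. The abstract gives no formulas, so both functions are assumptions: the user-based sketch takes items symmetrically around a class median, and the item-based sketch takes the items with the highest posterior probability. The function names and example values are hypothetical.

```python
def user_based_sample(values, k):
    """Pick about k values centred symmetrically on the median of one class."""
    ordered = sorted(values)
    mid = len(ordered) // 2   # index of the median (upper median if even)
    half = k // 2             # how many items to take on each side
    return ordered[max(0, mid - half): mid + half + 1]


def item_based_sample(posteriors, k):
    """Return indices of the k items with the highest posterior probability."""
    ranked = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    return ranked[:k]


# Hypothetical transaction amounts within one class, and hypothetical
# posterior probabilities of the "risky" class for four items.
around_median = user_based_sample([7, 1, 4, 3, 6, 2, 5], 3)
riskiest = item_based_sample([0.1, 0.9, 0.4, 0.8], 2)
```

A hybrid in the spirit of the abstract could draw part of the sample with each function, trading representativeness against riskiness.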

algorithm, audit evidence, equation, (14 more...)

2403.14069

Country:

Asia > Taiwan (0.25)
North America > United States > New York (0.05)
North America > Panama (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

#artificialintelligence Aug-3-2021, 18:50:41 GMT

NLP Techniques being Helpful for Spam Detection

NLP techniques are used to prepare data for training spam detectors. In today's multimedia-driven world, gathering information and connecting with people has become extremely easy thanks to social media and the internet. As a result, we receive hundreds of messages and emails every day, many of them unwanted. These unwanted messages are called spam, and the useful ones are called ham. Today we shall see how spam filtering with Natural Language Processing (NLP) is applied to data to produce classified examples for training models to detect spam messages.

nlp technique, spam detection, spam message, (2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

#artificialintelligence Jun-10-2021, 18:25:56 GMT

What role does AI play in cybersecurity?

Many believe that cybersecurity is an exciting field to work in, and indeed it is. Yet being responsible for an organization's IT Security is no easy feat. Attackers always seem to be a few steps ahead of defenders. It often feels like a game of one against many – from petty criminals to nation-states. It would be highly advantageous if our cybersecurity tools could automatically adapt to these threats.

cybersecurity, incident response and security team, training data, (14 more...)

Country:

North America > Canada > Alberta (0.15)
Europe > Switzerland > Zürich > Zürich (0.05)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.82)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.54)
Health & Medicine > Therapeutic Area > Immunology (0.53)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

#artificialintelligence Apr-21-2021, 20:17:08 GMT

Spam Email Detection Using Machine Learning

There are 4,825 ham and 747 spam messages. This indicates that the data is imbalanced, which needs to be fixed. The top ham message is "Sorry, I'll call later", whereas the top spam message is "Please call our customer service…"; they occurred 30 and 4 times, respectively. First, let's create separate dataframes for ham and spam messages, then convert each to a NumPy array and then to a list so that we can generate a WordCloud later. Since this is text data, it contains many unnecessary stopwords, such as articles and prepositions, which need to be removed.
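The imbalance mentioned here can be quantified directly from the two counts. The sketch below computes the spam share and a pair of inverse-frequency class weights using the formula n_samples / (n_classes * class_count), which is the same heuristic behind scikit-learn's "balanced" class weighting. Class weighting is only one possible fix and is an assumption here, since the excerpt does not say which remedy the article applies.

```python
# Class counts quoted in the text above.
counts = {"ham": 4825, "spam": 747}

n_samples = sum(counts.values())   # 5572 messages in total
n_classes = len(counts)

# Fraction of the dataset that is spam (roughly 13%).
spam_share = counts["spam"] / n_samples

# Inverse-frequency weights: the rarer class gets the larger weight.
class_weights = {
    label: n_samples / (n_classes * count) for label, count in counts.items()
}
```

With these weights, each spam message counts about 6.5 times as much as a ham message during training. Downsampling the ham class to 747 messages would be an alternative remedy.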

architecture, ham message, spam message, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Nov-30-2020, 08:18:14 GMT

Understanding Naïve Bayes and Support Vector Machine and their implementation in Python

This article was published as a part of the Data Science Blogathon. In this digital world, spam is the most troublesome challenge that everyone is facing. Sending spam messages to people causes various problems that may, in turn, cause economic losses. Spam messages cost us memory space, computing power, and speed, and removing them takes up our time.

bayes and support vector machine, importing package, posterior probability, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.90)

#artificialintelligence Sep-25-2020, 20:02:02 GMT

Top 20 Dataset in Machine Learning

The dataset is one of the main parts of building a machine learning model. Before we start with any algorithm, we need a proper understanding of the data. These machine learning datasets are mainly used for research purposes. Most of the datasets are homogeneous in nature. We use a dataset to train and evaluate our model, and it plays a vital role in the whole process. If our dataset is structured, low in noise, and properly cleaned, then our model will give good accuracy at evaluation time. The ImageNet dataset was created by a group of researchers, and its images are organized according to the WordNet hierarchy. This dataset can be used for machine learning and computer vision research.

artificial intelligence, machine learning, natural language, (16 more...)

Country:

North America > United States > Oklahoma > Payne County > Cushing (0.05)
North America > United States > Wisconsin (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.46)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

#artificialintelligence Jun-5-2020, 09:56:39 GMT

How Machine Learning in Search Works: Everything You Need to Know

In the world of SEO, it's important to understand the system you're optimizing for. Another crucial area to understand is machine learning. Now, the term "machine learning" gets thrown around a lot these days. But how does machine learning actually impact search and SEO? This chapter will explore everything you need to know about how search engines use machine learning.

information retrieval, machine learning, natural language, (13 more...)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.62)